“The distinct property of DRL programs is learning through trial and error from feedback that’s simultaneously sequential, evaluative, and sampled by leveraging powerful non-linear function approximation.”
“Traditionally, any piece of software that displays cognitive abilities such as perception, search, planning, and learning is considered part of AI.”
“There are three main branches of ML: supervised, unsupervised, and reinforcement learning.”
(“Grokking deep reinforcement learning Ebook.pdf”, p. 5)
“A powerful recent approach to ML, called deep learning (DL), involves using multi-layered non-linear function approximation, typically neural networks.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 5)
control theory (CT): studies ways to control complex, known dynamical systems.
operations research (OR): studies decision-making under uncertainty, often with much larger action spaces than commonly seen in DRL (e.g., financial engineering, policy modeling, and public-sector work).
(“Grokking deep reinforcement learning Ebook.pdf”, p. 6)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 7)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 8)
The transition function maps (state, action) to a new state.
The reward function maps (state, action) to a reward.
Together, the transition function and the reward function are called the model.
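The two maps above can be sketched concretely. A minimal sketch with a made-up two-state world (the states, actions, and reward values are illustrative, not from the book):

```python
# Transition function P and reward function R as maps from (state, action).
P = {  # (state, action) -> next state
    ("home", "walk"): "work",
    ("home", "stay"): "home",
    ("work", "walk"): "home",
    ("work", "stay"): "work",
}
R = {  # (state, action) -> reward
    ("home", "walk"): 0.0,
    ("home", "stay"): -1.0,
    ("work", "walk"): 0.0,
    ("work", "stay"): 1.0,
}

def model_step(state, action):
    """The model is the pair (P, R): it predicts the next state and the reward."""
    return P[(state, action)], R[(state, action)]

print(model_step("home", "walk"))  # ('work', 0.0)
```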
(“Grokking deep reinforcement learning Ebook.pdf”, p. 9)
“The agent can be designed to learn mappings from observations to actions called policies.”
“The agent can be designed to learn the model of the environment in mappings called models.”
“The agent can be designed to learn to estimate the reward-to-go in mappings called value functions.”
“At each time step, the agent observes the environment, takes action, and receives a new observation and reward. The set of the state, the action, the reward, and the new state is called an experience.”
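The interaction loop described in this quote can be sketched in a few lines. This is a hypothetical toy environment (a five-position corridor, not from the book) driven by a random policy; each time step produces one experience tuple (state, action, reward, new state):

```python
import random

def env_step(state, action):
    # Toy environment: positions 0..4; reaching position 4 ends the episode.
    next_state = max(0, min(4, state + (1 if action == "right" else -1)))
    reward = 1.0 if next_state == 4 else 0.0
    done = next_state == 4
    return next_state, reward, done

def collect_episode(seed=0):
    rng = random.Random(seed)
    state, done, experiences = 0, False, []
    while not done:
        action = rng.choice(["left", "right"])  # a random policy
        next_state, reward, done = env_step(state, action)
        # One experience: (state, action, reward, new state)
        experiences.append((state, action, reward, next_state))
        state = next_state
    return experiences

episode = collect_episode()
```

Because the task has a natural ending (reaching position 4), the sequence of collected experiences is exactly one episode in the book's terminology.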
(“Grokking deep reinforcement learning Ebook.pdf”, p. 10)
“The task the agent is trying to solve may or may not have a natural ending. Tasks that have a natural ending, such as a game, are called episodic tasks. Conversely, tasks that don’t are called continuing tasks, such as learning forward motion. The sequence of time steps from the beginning to the end of an episodic task is called an episode.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 10)
“Sequential feedback gives rise to a problem referred to as the temporal credit assignment problem. The temporal credit assignment problem is the challenge of determining which state and/or action is responsible for a reward.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 11)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 11)
“Evaluative feedback gives rise to the need for exploration.”
“This is also referred to as the exploration versus exploitation trade-off.”
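The exploration-versus-exploitation trade-off is often illustrated with epsilon-greedy action selection on a multi-armed bandit. A minimal sketch, assuming a made-up two-armed bandit with hidden payoff probabilities (not code from the book):

```python
import random

def epsilon_greedy(q_estimates, epsilon, rng):
    if rng.random() < epsilon:                     # explore: try a random arm
        return rng.randrange(len(q_estimates))
    return max(range(len(q_estimates)),            # exploit: pick the best estimate
               key=q_estimates.__getitem__)

rng = random.Random(42)
true_means = [0.3, 0.7]   # hidden payoff probabilities (unknown to the agent)
q = [0.0, 0.0]            # the agent's value estimates
counts = [0, 0]

for _ in range(2000):
    arm = epsilon_greedy(q, epsilon=0.1, rng=rng)
    reward = 1.0 if rng.random() < true_means[arm] else 0.0
    counts[arm] += 1
    q[arm] += (reward - q[arm]) / counts[arm]  # incremental mean update
```

With enough trials, the agent pulls the better arm (index 1) far more often, yet the occasional random pulls keep its estimate of the worse arm from staying wrong forever.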
(“Grokking deep reinforcement learning Ebook.pdf”, p. 12)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 13)
“Agents that are designed to approximate policies are called policy-based; agents that are designed to approximate value functions are called value-based; agents that are designed to approximate models are called model-based; and agents that are designed to approximate both policies and value functions are called actor-critic. Agents can be designed to approximate one or more of these components.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 13)
“Neural networks aren’t necessarily the best solution to every problem; neural networks are data hungry and challenging to interpret, and you must keep these facts in mind. However, neural networks are among the most potent function approximations available, and their performance is often the best.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 14)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 14)
“Alan Turing’s work in the 1930s, 1940s, and 1950s that paved the way for modern computer science and AI by laying down critical theoretical foundations that later scientists leveraged” (“Grokking deep reinforcement learning Ebook.pdf”, p. 15)
“Turing Test,” (“Grokking deep reinforcement learning Ebook.pdf”, p. 15)
“The formal beginnings of AI as an academic discipline can be attributed to John McCarthy,” who is credited with:
Coining the term “artificial intelligence.”
Leading the first AI conference in 1956.
Inventing the Lisp programming language in 1958.
Co-founding the MIT AI Lab in 1959.
(“Grokking deep reinforcement learning Ebook.pdf”, p. 16)
“Things got worse when a well-known researcher named James Lighthill compiled a report criticizing the state of academic research in AI” (“Grokking deep reinforcement learning Ebook.pdf”, p. 15)
“All of these developments contributed to a long period of reduced funding and interest in AI research known as the first AI winter.”
“reduced funding by government and industry partners.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 16)
“We are likely in another highly optimistic time in AI history, so we must be careful.”
“Edsger W. Dijkstra famously said: “The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.””
“Companies such as Google, Facebook, Microsoft, Amazon, and Apple have invested in AI research and have become highly profitable thanks, in part, to AI systems.”
“best computing power available”
“tremendous amounts of data”
“Current AI research has become more stable and more productive.”
“The use of artificial neural networks for RL problems started around the 1990s.”
“TD-Gammon was one of the first widely reported success stories using ANNs to solve complex RL problems.”
(“Grokking deep reinforcement learning Ebook.pdf”, p. 17)
“In 2004, Andrew Ng et al. developed an autonomous helicopter”
“inverse reinforcement learning,”
“Nate Kohl and Peter Stone”
“policy-gradient methods”
“There were other successes in the 2000s, but the field of DRL really only started growing after the DL field took off around 2010. In 2013 and 2015, Mnih et al. published a couple of papers presenting the DQN algorithm. DQN learned to play Atari games from raw pixels. Using a convolutional neural network (CNN) and a single set of hyperparameters, DQN performed better than a professional human player in 22 out of 49 games.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 18)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 18)
“This accomplishment started a revolution in the DRL community: In 2014, Silver et al. released the deterministic policy gradient (DPG) algorithm, and a year later Lillicrap et al. improved it with deep deterministic policy gradient (DDPG). In 2016, Schulman et al. released trust region policy optimization (TRPO) and generalized advantage estimation (GAE) methods, Sergey Levine et al. published Guided Policy Search (GPS), and Silver et al. demoed AlphaGo. The following year, Silver et al. demonstrated AlphaZero. Many other algorithms were released during these years: double deep Q-networks (DDQN), prioritized experience replay (PER), proximal policy optimization (PPO), actor-critic with experience replay (ACER), asynchronous advantage actor-critic (A3C), advantage actor-critic (A2C), actor-critic using Kronecker-factored trust region (ACKTR), Rainbow, Unicorn (these are actual names, BTW), and so on. In 2019, Oriol Vinyals et al. showed the AlphaStar agent beat professional players at the game of StarCraft II. And a few months later, Jakub Pachocki et al. saw their team of Dota-2-playing bots, called Five, become the first AI to beat the world champions in an e-sports game.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 18)
“Thanks to the progress in DRL, we’ve gone in two decades from solving backgammon, with its 10^20 perfect-information states, to solving the game of Go, with its 10^170 perfect-information states, or better yet, to solving StarCraft II, with its 10^270 imperfect-information states. It’s hard to conceive a better time to enter the field. Can you imagine what the next two decades will bring us? Will you be part of it? DRL is a booming field, and I expect its rate of progress to continue.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 19)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 20) When the Industrial Revolution began around the 1750s, people thought it would destroy all the jobs. A century later, the long-term effects of those changes were benefiting communities. The digital revolution started in the 1970s with the introduction of the personal computer; then the Internet changed the way we do things.
“I expect in a few decades humans won’t even need to work for food, clothing, or shelter because these things will be automatically produced by AI. We’ll thrive with abundance.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 20)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 20)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 21)
“You could formulate any ML problem as a DRL problem, but this isn’t always a good idea” (“Grokking deep reinforcement learning Ebook.pdf”, p. 22)
“What are the pros and cons?” (“Grokking deep reinforcement learning Ebook.pdf”, p. 22)
“letting the machine take control” (“Grokking deep reinforcement learning Ebook.pdf”, p. 22)
“training from scratch every time can be daunting, time consuming, and resource intensive. However, there are a couple of areas that study how to bootstrap previously acquired knowledge. First, there’s transfer learning, which is about transferring knowledge gained in tasks to new ones.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 23)
“if you want to teach a robot to use a hammer and a screwdriver, you could reuse low-level actions learned on the “pick up the hammer” task and apply this knowledge to start learning the “pick up the screwdriver” task. This should make intuitive sense to you, because humans don’t have to relearn low-level motions each time they learn a new task. Humans seem to form hierarchies of actions as we learn. The field of hierarchical reinforcement learning tries to replicate this in DRL agents.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 23)
“DRL is about mastering specific tasks. Unlike SL, in which generalization is the goal,” (“Grokking deep reinforcement learning Ebook.pdf”, p. 23)
“Deep reinforcement learning’s weaknesses Of course, DRL isn’t perfect. One of the most significant issues you’ll find is that in most problems, agents need millions of samples to learn well-performing policies. Humans, on the other hand, can learn from a few interactions. Sample efficiency is probably one of the top areas of DRL that could use improvements. We’ll touch on this topic in several chapters because it’s a crucial one.” (“Grokking deep reinforcement learning Ebook.pdf”, p. 23)
(“Grokking deep reinforcement learning Ebook.pdf”, p. 24)
“Should the reward be as dense as possible, which makes learning faster, or as sparse as possible, which makes the solutions more exciting and unique?” (“Grokking deep reinforcement learning Ebook.pdf”, p. 24)
“There’s ongoing interesting research on reward signals. One I’m particularly interested in is called intrinsic motivation.”
“I don’t want people to think that from this book, they’ll be able to come up with a trading agent that will make them rich.”
“The fact is that learning will come from the combination of me putting in the effort to make concepts understandable and you putting in the effort to understand them.”
“In chapters 3 through 7, you learn about agents that can learn from sequential and evaluative feedback, first in isolation, and then in interplay.”
“In chapters 8 through 12, you dive into core DRL algorithms, methods, and techniques.”
“Chapters 1 and 2 are about introductory concepts applicable to DRL in general,”
“first part (chapters 3 through 7) is for you to understand “tabular” RL. That is, RL problems that can be exhaustively sampled, problems in which there’s no need for neural networks or function approximation of any kind.”
“Chapter 3 is about the sequential aspect of RL and the temporal credit assignment problem.”
“evaluative feedback and the exploration versus exploitation trade-off in chapter 4”
“In chapter 5, you study agents that learn to estimate the results of fixed behavior”
“Chapter 6 deals with learning to improve behavior, and chapter 7 shows you techniques that make RL more effective and efficient.”
“the second part (chapters 8 through 12) is for you to grasp the details of core DRL algorithms”
“In chapters 8 through 10, we go deep into value-based DRL.”
“In chapter 11, you learn about policy-based DRL and actor-critic”
“chapter 12 is about deterministic policy gradient (DPG) methods, soft actor-critic (SAC) and proximal policy optimization (PPO) methods.”
(“Grokking deep reinforcement learning Ebook.pdf”, p. 26)
“I found PyTorch to be a “Pythonic” library” (“Grokking deep reinforcement learning Ebook.pdf”, p. 28)
“The code is written in Python, and I make heavy use of NumPy and PyTorch. I chose PyTorch, instead of Keras, or TensorFlow, because I found PyTorch to be a “Pythonic” library” (“Grokking deep reinforcement learning Ebook.pdf”, p. 28)